BUPT - MCPRL at TRECVID 2009

Authors

  • Zhicheng Zhao
  • Yanyun Zhao
  • Zan Gao
  • Xiaoming Nan
  • Mei Mei
  • Hui Zhang
  • Heng Chen
  • Xu Peng
  • Yuanbo Chen
  • Junfang Guo
  • Anni Cai
Abstract

This paper describes the BUPT-MCPRL systems for TRECVID 2009. We performed experiments in the automatic search, high-level feature (HLF) extraction, copy detection and event detection tasks.

A. Automatic search

A semantic-based video search system was proposed; a brief description of the 10 submitted runs is given in Table 1.

Table 1. The performance of 10 runs for automatic search

Run ID | infMAP | Description
F_A_N_BUPT-MCPR1 | 0.104 | HLF-based retrieval with the positive WDSS method
F_A_N_BUPT-MCPR2 | 0.070 | Concept-based retrieval with the positive WDSS method
F_A_N_BUPT-MCPR3 | 0.059 | Concept-based retrieval with the positive and negative WDSS method
F_A_N_BUPT-MCPR4 | 0.131 | Combining the concept lexicons of MCPR1 (high-level features) and MCPR2 (search topics), with the positive WDSS method
F_A_N_BUPT-MCPR5 | 0.032 | Concept-based retrieval with the example bagging method
F_A_N_BUPT-MCPR6 | 0.024 | Visual example-based retrieval
F_A_N_BUPT-MCPR7 | 0.024 | Concept-based retrieval with the example weighting method
F_A_N_BUPT-MCPR8 | 0.016 | MCPR7 re-ranked with face scores
F_A_N_BUPT-MCPR9 | 0.009 | Fusion of MCPR6 and MCPR7, re-ranked with face scores
F_A_N_BUPT-MCPR10 | 0.048 | Fusion of MCPR5, MCPR6 and MCPR7

B. High-level feature extraction

This year, our HLF system focused on boosting and fusion of low-level features, on the difference between classifiers trained with and without cross-validation, and on re-ranking of results according to face detection.

Table 2. HLF results and description of the BUPT-MCPRL system

HLF Run | infMAP | Description
BUPT-MCPRL_Sys1 | 0.0313 | BUPT-MCPRL_Sys3 modified by face detection results
BUPT-MCPRL_Sys2 | 0.0487 | Fusion of the results of BUPT-MCPRL_Sys3, BUPT-MCPRL_Sys4 and BUPT-MCPRL_Sys5
BUPT-MCPRL_Sys3 | 0.03515 | Nineteen models per concept; fuses local and global features, without cross-validation
BUPT-MCPRL_Sys4 | 0.02255 | Nineteen models per concept; fuses only local features, without cross-validation
BUPT-MCPRL_Sys5 | 0.05995 | Seven models per concept; fuses local and color features, with cross-validation during training
BUPT-MCPRL_Sys6 | 0.04835 | Seven models per concept; fuses local and color features, without cross-validation during training

* This work was supported by the China National Natural Science Foundation under Projects 60772114 and 90920001.

C. Copy detection

Two different retrieval algorithms, based on SURF and SIFT respectively, were independently proposed to detect copied videos.

D. Event detection

Our event detection mainly adopted SVM models and a rule-based method.

1. Automatic Search

1.1 The Proposed Framework

The proposed semantic-based video search system consists of several main components, including text and visual query preprocessing, visual feature extraction, classification, multimodal fusion and result re-ranking. The framework of our search system is shown in Figure 1.1.

[Figure 1.1 The framework of the automatic search system. Blocks: Textual Query, Visual Example Query, Query Preprocessing & Analysis, Text-concept Mapping, Example-concept Mapping, HLF-based Retrieval, Concept-based Retrieval, Visual Feature Extraction, SVM Classifier, Visual-based Retrieval, Multimodal Fusion, Re-ranking based on Face Information, Retrieval List.]

From Table 1, we can see that MCPR4 achieves the best MAP among our 10 submitted runs.
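To make the data flow of Figure 1.1 easier to follow, here is a minimal Python sketch of the pipeline. All function names, concepts and scores are placeholders of ours, not part of the BUPT-MCPRL implementation; only the ordering of the stages (query preprocessing, concept mapping, the retrieval branches, multimodal fusion and face-based re-ranking) follows the figure.

```python
# Hypothetical skeleton of the search pipeline in Figure 1.1.
# Function bodies are toy placeholders; only the control flow mirrors the paper.

def preprocess_query(text, examples):
    """Query preprocessing & analysis: expansion, stop-word removal, etc."""
    terms = [t.lower() for t in text.split() if t.lower() not in {"find", "shots", "of"}]
    return terms, examples

def map_text_to_concepts(terms):
    """Text-concept mapping (WDSS); returns {concept: weight}."""
    return {"boat_ship": 0.7, "waterscape": 0.3}       # toy weights

def map_examples_to_concepts(examples):
    """Example-concept mapping via concept classifiers run on the example images."""
    return {"boat_ship": 0.6, "sky": 0.4}              # toy weights

def concept_based_scores(concept_weights, shots):
    """Weight pre-computed concept detector scores for every shot."""
    return {s: sum(w * shots[s].get(c, 0.0) for c, w in concept_weights.items())
            for s in shots}

def visual_based_scores(examples, shots):
    """Visual example-based retrieval: SVMs trained on the example images."""
    return {s: 0.1 for s in shots}                     # stand-in score

def fuse_and_rerank(score_lists, face_scores):
    """Multimodal max fusion followed by face-based re-ranking."""
    fused = {s: max(sc[s] for sc in score_lists) for s in score_lists[0]}
    reranked = {s: fused[s] + 0.1 * face_scores.get(s, 0.0) for s in fused}
    return sorted(reranked, key=reranked.get, reverse=True)

# Toy shot index: shot id -> concept detector scores.
shots = {"shot1_1": {"boat_ship": 0.9}, "shot2_3": {"sky": 0.8}}
terms, examples = preprocess_query("Find shots of boats in the water", examples=[])
lists = [concept_based_scores(map_text_to_concepts(terms), shots),
         concept_based_scores(map_examples_to_concepts(examples), shots),
         visual_based_scores(examples, shots)]
print(fuse_and_rerank(lists, face_scores={}))
```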
Overall, our contributions are summarized as follows:

  • Proper use of the Weight Distribution based on Semantic Similarity (WDSS) method. The WDSS strategy aims to select the concepts with the highest semantic similarity by parsing the lexicon of the text query and the lexicon of visual concepts.
  • Evaluation of a large number of visual descriptors. We explored various low-level visual features at different granularities and employed a boosting-based feature selection method to choose the most effective descriptors.
  • Exploration of different ways of exploiting visual examples. Training classifiers on visual clips, bagging with example scores, and weighting with selected concepts are all applied in our system.
  • Comparison of retrieval strategies based on textual queries with those based on visual queries. Among the submitted results, the first 4 runs focus on the text-concept mapping strategy, while the other 6 runs attempt to make use of topic examples. From Table 1, we find that the text-concept mapping method performs much better than visual-based retrieval.

1.2 Building the Dataset

To overcome the lack of prior knowledge, we built two concept lexicons: one of high-level features from TRECVID over the past 3 years, and the other of the 48 topics from the TRECVID 2008 search task. The ground truth for all these concept sets was manually annotated on the Sound and Vision development data (tv7.sv.devel, tv7.sv.test), which was divided into 3 partitions: 60% of the annotated shots as training data, 20% as validation data, and the remaining 20% as fusion data.

1.3 Feature Selection

Since no single visual feature can represent all the information contained in a video, and no given visual feature is effective for all concepts, we extracted a large number of features at local, regional and global levels, which can be divided into four categories: key-point features, texture features, edge features and color features. At the same time, in order to save computation time and improve effectiveness, a feature selection scheme based on a boosting method was employed. In the end, the 11 best-performing low-level features were selected for our system, as listed in Table 3 [1, 4, 6, 7, 8, 9].

Table 3. Selected low-level visual features

Feature | Description
SIFT + SURF Global | Concatenated SIFT and SURF visual keywords with global partition
SIFT + SURF Region 1*3 | Concatenated SIFT and SURF visual keywords with 1*3 regional partition
SIFT + SURF Region 2*2 | Concatenated SIFT and SURF visual keywords with 2*2 regional partition
SIFT + SURF Region 3*1 | Concatenated SIFT and SURF visual keywords with 3*1 regional partition
Gabor Wavelet | 3-scale and 6-direction Gabor feature with 3*3 regional partition
Local Binary Pattern | 256-dimensional histogram of LBP codes with global partition
Edge Directional Histogram (EDH) | 145-dimensional histogram concatenating global and regional EDH
R_HoG | Histogram of Oriented Gradients with rectangular blocks
HSV_Correlogram | Color auto-correlogram feature with global partition
RGB_BlkHist | RGB color histogram with 3*3 regional partition
Block RGB Moment | RGB color moments with 5*5 regional partition
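The paper does not detail its boosting-based feature selection, so the sketch below shows one plausible reading of it: each feature channel acts as a weak learner, and channels are picked greedily with AdaBoost-style sample re-weighting. The channel names, the decision-tree weak learner and the synthetic data are our assumptions for illustration only.

```python
# AdaBoost-style greedy selection over feature channels (an assumed variant;
# the notebook paper does not spell out its exact boosting procedure).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def select_channels(channels, y, n_select):
    """channels: dict name -> (n_samples, dim) array; y: 0/1 labels."""
    w = np.full(len(y), 1.0 / len(y))            # sample weights
    selected = []
    for _ in range(n_select):
        best = None
        for name, X in channels.items():
            if name in selected:
                continue
            clf = DecisionTreeClassifier(max_depth=2, random_state=0)
            clf.fit(X, y, sample_weight=w)
            err = np.sum(w * (clf.predict(X) != y))
            if best is None or err < best[1]:
                best = (name, err, clf.predict(X))
        name, err, pred = best
        err = np.clip(err, 1e-6, 1 - 1e-6)
        alpha = 0.5 * np.log((1 - err) / err)     # AdaBoost channel weight
        w *= np.exp(alpha * (pred != y))          # up-weight misclassified shots
        w /= w.sum()
        selected.append(name)
    return selected

# Toy data: three synthetic "channels", one of them informative.
y = rng.integers(0, 2, 200)
channels = {
    "sift_surf_global": rng.normal(size=(200, 8)) + y[:, None],  # informative
    "lbp": rng.normal(size=(200, 8)),
    "rgb_hist": rng.normal(size=(200, 8)),
}
print(select_channels(channels, y, n_select=2))
```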
1.4 Visual Example-based Search

For visual example-based retrieval, the selected visual features were extracted to describe the images of the topic examples, and for each feature an SVM classifier was built for each topic. The positive samples used to train the classifier were the topic examples, and the negative samples were randomly sampled without repetition from the negative training set. We used the radial basis function as the kernel and determined its parameters with a coarse-to-fine search using 3-fold cross-validation at the training stage. However, because of the limited number of topic examples, the MAP of example-based retrieval only reaches 0.024 (MCPR6) and needs further improvement. This observation is consistent with the conclusions in [3, 5, 10].

1.5 Concept-based Search

Concept-based retrieval plays a crucial role in improving the performance of automatic video retrieval systems [11, 12]. The key problem for this module is how to use a limited number of pre-trained concept classifiers to satisfy a wide range of user queries. Two concept lexicons were adopted in our system: the first is the set of HLF concepts of TRECVID over the past 3 years, and the second consists of the 48 search topics of TRECVID 2008. We explored both a text-concept mapping strategy and an example-concept mapping strategy.

For text-concept mapping, we select semantically similar concepts according to the Weight Distribution based on Semantic Similarity (WDSS) mapping method [2, 3, 11]. First, query expansion with synonyms and acronyms, together with stop-word removal, is performed in the preprocessing stage. Then, using WordNet, we compute the similarity of each concept to the query words; after a similarity clustering, a variable number of similar concepts are selected, with their normalized similarity scores used as concept weights. In addition, a negative WDSS method was also introduced in our system. For example, the relevant shots of topic 290, "Find shots of one or more ships or boats, in the water", should not contain concepts like "office" and "kitchen", so the weights of these concepts were set to negative values to penalize their appearance.

On the other hand, we also explored the example-concept mapping strategy [3, 10]. The first method acquires concept weights by classifying the topic examples to obtain a probability vector; using this vector, the concept scores for each shot are linearly weighted and the retrieval list is computed. The second method is a bagging strategy with example scores, in which classifiers are trained on example score vectors and the retrieval list is computed by classifying each shot with these detectors. From the results table, we find that the MAP of the bagging strategy outperforms that of the direct weighting method by 33.3%.
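The per-topic SVM training in Section 1.4 (RBF kernel, coarse-to-fine parameter search, 3-fold cross-validation) can be illustrated with a short sketch. The use of scikit-learn, the balanced class weights and the specific parameter grids below are our assumptions; the paper names neither a toolkit nor its grid ranges.

```python
# Sketch of per-topic RBF-SVM training with a coarse-to-fine grid search
# and 3-fold cross-validation (scikit-learn and the grids are assumptions,
# not the toolkit used by the authors).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)

def train_topic_svm(pos, neg):
    X = np.vstack([pos, neg])
    y = np.r_[np.ones(len(pos)), np.zeros(len(neg))]

    # Coarse pass over wide, log-spaced C and gamma ranges.
    coarse = {"C": np.logspace(-2, 4, 7), "gamma": np.logspace(-4, 1, 6)}
    search = GridSearchCV(SVC(kernel="rbf", class_weight="balanced"),
                          coarse, cv=3).fit(X, y)
    c0, g0 = search.best_params_["C"], search.best_params_["gamma"]

    # Fine pass in a narrow band around the coarse optimum.
    fine = {"C": c0 * np.logspace(-0.5, 0.5, 5),
            "gamma": g0 * np.logspace(-0.5, 0.5, 5)}
    search = GridSearchCV(SVC(kernel="rbf", class_weight="balanced", probability=True),
                          fine, cv=3).fit(X, y)
    return search.best_estimator_

# Toy stand-ins: a handful of topic examples vs. randomly sampled negatives.
pos = rng.normal(loc=1.0, size=(12, 32))
neg = rng.normal(loc=0.0, size=(120, 32))
model = train_topic_svm(pos, neg)
print(model.predict_proba(rng.normal(size=(3, 32)))[:, 1])   # shot scores
```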
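The WDSS text-concept mapping of Section 1.5 can likewise be sketched with NLTK's WordNet interface. The concept lexicon, the choice of path similarity, and the simple cut-off used in place of the paper's similarity clustering are illustrative assumptions on our part.

```python
# Approximate WDSS text-to-concept mapping with WordNet path similarity.
# Requires: pip install nltk; nltk.download("wordnet"). The concept lexicon,
# similarity measure and cut-off are illustrative assumptions.
from nltk.corpus import wordnet as wn

CONCEPTS = ["boat", "waterscape", "office", "kitchen", "person", "vehicle"]

def similarity(word, concept):
    """Best path similarity between any sense of `word` and of `concept`."""
    scores = [s1.path_similarity(s2) or 0.0
              for s1 in wn.synsets(word) for s2 in wn.synsets(concept)]
    return max(scores, default=0.0)

def wdss_weights(query_terms, cutoff=0.2):
    """Positive WDSS: keep concepts above the cut-off, normalize their scores.
    A negative WDSS variant would additionally give clearly unrelated concepts
    (e.g. "office", "kitchen" for topic 290) negative weights."""
    raw = {c: max(similarity(t, c) for t in query_terms) for c in CONCEPTS}
    kept = {c: s for c, s in raw.items() if s >= cutoff}
    total = sum(kept.values()) or 1.0
    return {c: s / total for c, s in kept.items()}

if __name__ == "__main__":
    # Topic 290: "Find shots of one or more ships or boats, in the water"
    print(wdss_weights(["ship", "boat", "water"]))
```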
1.6 Multimodal Fusion and Re-ranking Strategy

In our system, a linear fusion weighted by average precision was applied for intra-concept fusion, while max fusion was employed for multimodal fusion; the average precision weights were estimated on the fusion dataset. This year, we also tried to use face detection information for each shot as a basis for re-ranking: for topics related to people, face scores computed from the number and size of detected faces were added to the original result list. However, perhaps because of an unsuitable strategy, or because of the low precision of our face detection algorithm, the re-ranking performance is not satisfactory.

1.7 Experiments and Discussions

This year, we submitted 10 automatic search (type A) runs for the search task; their performance is shown in Figure 1.2. Because of the unreliable face detection, MCPR8 and MCPR9 with re-ranking had our lowest MAPs of 0.009 and 0.016. For the visual example-concept mapping strategy, the weighting method reached 0.024, while the bagging strategy improved this to 0.032. With a limited number of positive samples, MCPR7, using detectors trained on topic examples, achieved 0.024, and MCPR10, the max fusion of all example-based retrieval runs, had a MAP of 0.048.

[Figure 1.2 The performance of 10 submitted runs for automatic search. The red bars are from BUPT_MCPRL.]

For the text-concept mapping method, the MAP was greatly improved over example-based retrieval. MCPR2 (0.070), using only the positive concept weight distribution, was better than MCPR3 (0.059), which used both positive and negative concept weights. Using the same positive weight distribution method, MCPR1, with 60 HLF concept detectors, reached 0.104, an improvement of 48.5% over MCPR2. From this comparison, we may deduce that a larger concept lexicon and a more accurate description of each concept lead to better retrieval performance. Finally, by combining the lexicons of both high-level features and search topics, MCPR4 achieved the best MAP of 0.131, ranked No. 1 among all the runs, as shown in Figure 1.3.

[Figure 1.3 The performance of BUPT_MCPR4.]

2. High-level Feature Extraction

This year, our HLF system focused on boosting and fusion of global features, local features and interest points, on the difference between classifiers trained with and without cross-validation, and on re-ranking of results according to face detection. The results indicated that the performance of the 5 submitted runs was not as good as last year.

2.1 Feature Representation

...


Similar Articles

BUPT - MCPRL at TRECVID 2011 *

In this paper, we describe BUPT-MCPRL systems for TRECVID 2011. Our team participated in five tasks: semantic indexing, known-item search, instance search, content-based copy detection and surveillance event detection. A brief introduction is shown as follows: In this year, we proposed two different methods: one based on text and another a bio-inspired method. All 2 runs we submitted are descri...


BUPT-MCPRL at TRECVID 2010

In this paper, we describe BUPT-MCPRL systems for TRECVID 2012. Our team participated in three tasks: known-item search, instance search and surveillance event detection. A brief introduction is shown as follows: A. Known-item search This year we submitted 4 automatic runs based on two different approaches, one of which is text-based and the other is visual feature-based. Results of all 4 runs ...


BUPT at TRECVID 2007: Shot Boundary Detection

In this paper we describe our methodologies and evaluation results for the shot boundary detection at TRECVID 2007. We submitted 10 runs results based on SVM classifiers and several separate detectors. BUPT_01 Default SVM parameters and a low threshold for motion detector BUPT_02 Default SVM parameters and a low threshold for edge detector BUPT_03 Make high penalty for false cuts to increase th...


BUPT at TREC 2009: Entity Track

This report introduces the work of BUPT (PRIS) in Entity Track in TREC2009. The task and data are both new this year. In our work, an improved two-stage retrieval model is proposed according to the task. The first stage is document retrieval, in order to get the similarity of the query and documents. The second stage is to find the relationship between documents and entities. We also focus on e...


REGIMVID at TRECVID 2009: Semantic Access to Multimedia Data

In this paper we describe our TRECVID 2009 video retrieval experiments. The REGIMVID team participated in two tasks: High Level Feature Extraction and Automatic Search. Our TRECVID 2009 experiments focus on increasing the robustness of a small set of sensors and the relevance of the results using a probabilistic weighting of learning examples.



Publication date: 2009